Learning Linear Regression

April 2, 2023 5 min read

Machine learning has become an integral part of stock market analysis and prediction. Linear Regression is a widely used algorithm for predicting stock prices. In this blog, we will discuss the Linear Regression model for predicting stock prices using the Python programming language.

What is Linear Regression

Linear regression is a type of supervised learning algorithm that makes predictions based on a linear relationship between the input variables (also known as features) and the output variable (also known as the target variable).
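In standard notation (not taken from the original post), with p features x_1, ..., x_p, the model and the least-squares criterion used to choose its coefficients can be written as below, where x_{ij} is the value of feature j on row i of the training data:

```latex
\begin{aligned}
\hat{y} &= \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p \\
(\hat{\beta}_0, \dots, \hat{\beta}_p) &= \arg\min_{\beta_0, \dots, \beta_p} \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Bigr)^2
\end{aligned}
```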

In the case of stock price prediction, the linear regression model is trained on historical price data, with features such as the day's opening, high, and low prices, the previous day's closing price, trading volume, and technical indicators such as RSI, EMA, and ATR. The target variable is typically the closing price, since this is the price that investors are most interested in predicting.

The linear regression model uses the training data to learn the relationship between the input variables (features) and the target variable by estimating the coefficients of the linear equation that best fits the data, in the least-squares sense for ordinary linear regression. Once the model has been trained, it can make predictions on new, unseen data: plug in the values of the input variables and compute the output from the learned coefficients.
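To make "estimating the coefficients" concrete, here is a minimal sketch on hypothetical synthetic data (not the NIFTY dataset). It solves the least-squares problem directly with NumPy and compares the result with scikit-learn's LinearRegression; the data-generating equation is an assumption chosen only so the recovered coefficients are easy to check.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical synthetic data: y = 3 + 2*x1 - 0.5*x2 + small noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.1, size=200)

# Ordinary least squares "by hand": add a column of ones for the intercept,
# then solve min ||X_design @ beta - y||^2
X_design = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)

# The same fit with scikit-learn
model = LinearRegression().fit(X, y)
print("numpy:   intercept", beta[0], "coefs", beta[1:])
print("sklearn: intercept", model.intercept_, "coefs", model.coef_)

# Prediction = plug new feature values into the learned equation
x_new = np.array([[1.0, 2.0]])
print("prediction:", model.predict(x_new)[0])  # close to 3 + 2*1 - 0.5*2 = 4
```

Both approaches solve the same least-squares problem, so the intercept and coefficients agree up to floating-point error.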

Preparing the Feature Dataset

[Figure: Linear Regression, features and prediction variables]

The dataset used to implement the model is the NIFTY_EOD.csv file, which contains end-of-day values for the NIFTY index: Open, High, Low, Close, Volume, previous close, RSI, EMA (columns ema5 and ema10), HMA (hma5, hma7, and hma9), ADX, PDI, MDI, and ATR.

Download the Linear Regression Features Dataset prepared using Amibroker

Download NIFTY EOD csv data set

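The dataset is split into two parts: training data and testing data. The training data is used to fit the model, and the testing data is used to evaluate its performance. In the code, train_test_split holds out 20% of the rows for testing and, by default, shuffles the rows before splitting.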
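Because train_test_split shuffles rows by default (shuffle=True), the test days of a daily price series end up interleaved with the training days. For time series, a chronological hold-out is often preferred. A minimal sketch on hypothetical data, assuming the rows are sorted oldest-first, passes shuffle=False so the last 20% of days form the test set:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical daily data, assumed to be sorted oldest -> newest
dates = pd.date_range("2020-01-01", periods=10, freq="D")
df = pd.DataFrame({"feature": np.arange(10.0), "Close": np.arange(10.0) * 2}, index=dates)

X = df[["feature"]]
y = df["Close"]

# shuffle=False keeps chronological order: first 80% of rows train, last 20% test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)

print("last training day:", X_train.index.max())
print("first test day:   ", X_test.index.min())
```

With this split, every test day comes after every training day, which mirrors how the model would be used on future data.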

The Linear Regression model is created with the LinearRegression class from the scikit-learn library. After the features are standardized with StandardScaler, the model is trained on the training data using the fit() method.
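One way to keep the scaler and the model together, so that the scaler is fit only on training data and the same transform is reused at prediction time, is a scikit-learn Pipeline. This is a sketch on hypothetical data, not the code used in the post:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical training data with features on very different scales
rng = np.random.default_rng(1)
X_train = np.column_stack([rng.normal(17000, 500, 100), rng.normal(1e6, 1e5, 100)])
y_train = 0.9 * X_train[:, 0] + 1e-4 * X_train[:, 1] + rng.normal(0, 5, 100)

# fit(): StandardScaler learns mean/std from X_train only, then the regression is fit
pipe = make_pipeline(StandardScaler(), LinearRegression())
pipe.fit(X_train, y_train)

# predict(): the already-fitted scaler is applied automatically before the model
X_new = X_train[:3]
print(pipe.predict(X_new))
```

For plain (unregularized) LinearRegression, standardizing changes the scale of the coefficients but not the predictions; scaling matters more for regularized models such as Ridge or Lasso.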

Python Source code for Linear Regression Based Machine Learning Prediction

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score, explained_variance_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load the data
df = pd.read_csv('NIFTY_EOD.csv')

# Prepare the data: feature columns (X) and target column (y)
features = ['Open', 'High', 'Low', 'Volume', 'prevclose', 'rsi', 'ema5', 'ema10',
            'hma5', 'hma7', 'hma9', 'adx', 'pdi', 'mdi', 'atr']
X = df[features]
y = df['Close']

# Hold out 20% of the rows for testing (rows are shuffled by default)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize features using statistics from the training set only
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train the model
model = LinearRegression()
model.fit(X_train_scaled, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test_scaled)

# Compute accuracy metrics
mse = mean_squared_error(y_test, y_pred)
print("Mean squared error: ", mse)

# Save predicted vs actual values to CSV
df_pred = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred})
df_pred.to_csv('NIFTY_EOD_pred.csv', index=False)

# Compute accuracy metrics
mape = np.mean(np.abs((y_test - y_pred) / y_test)) * 100  # expressed in percent
r2 = r2_score(y_test, y_pred)
ev = explained_variance_score(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
mae = np.mean(np.abs(y_test - y_pred))

# Print accuracy metrics
print("Mean absolute percentage error (MAPE): ", mape)
print("R-squared: ", r2)
print("Explained variance: ", ev)
print("Mean squared error: ", mse)
print("Root mean squared error (RMSE): ", rmse)
print("Mean absolute error (MAE): ", mae)

# Make a prediction for the next day's close price
# Note: the last row's features include that day's Open/High/Low, so unless the CSV
# already pairs each row with the following day's close, this estimates the close
# of the last row's own session rather than a true next-day forecast.
last_row = df.tail(1)
last_row_scaled = scaler.transform(last_row[features])
next_day_pred = model.predict(last_row_scaled)[0]
print("Predicted close price for the next day: ", next_day_pred)

Python Output

Mean squared error: 16.089511432445203
Mean absolute percentage error (MAPE): 0.0604587376375362
R-squared: 0.9999992659810478
Explained variance: 0.9999992660030701
Mean squared error: 16.089511432445203
Root mean squared error (RMSE): 4.011173323660448
Mean absolute error (MAE): 2.2689889473336495
Predicted close price for the next day: 17360.236586417002

Here are the common steps involved in linear regression prediction:

1. Data Collection: Collect data relevant to the problem. For stock price prediction, this typically means historical prices, volumes, and market indicators.
2. Data Preprocessing: Clean and prepare the data for analysis: handle missing values and outliers, and scale or normalize the features.
3. Feature Selection: Identify the features most relevant to the problem. For stock price prediction, candidates include the open price, the previous close, volume, and technical indicators.
4. Training the Model: Choose an algorithm, such as linear regression, and fit it to the prepared training data so that it learns the patterns in that data.
5. Model Evaluation: Measure the model's performance on a separate test set that was not used during training. Common evaluation metrics include mean squared error, mean absolute error, and R-squared.
6. Hyperparameter Tuning: Adjust the algorithm's settings to improve performance. Ordinary least-squares linear regression has few hyperparameters; regularized variants such as Ridge and Lasso add a regularization strength, and gradient-based models add settings such as the learning rate and the number of iterations.
7. Prediction: Once the model is trained and evaluated, use it to make predictions on new, unseen data.
8. Model Deployment: Deploy the trained model into a production environment for real-world use.
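As an illustration of the tuning step for a linear model, the sketch below swaps in Ridge regression (a regularized variant; plain LinearRegression has no regularization strength to tune) and searches over its alpha parameter with cross-validation folds that respect time order. The data is hypothetical and the alpha grid is an arbitrary choice:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical time-ordered data (rows assumed sorted oldest -> newest)
rng = np.random.default_rng(2)
X = rng.normal(size=(300, 5))
y = X @ np.array([1.5, -2.0, 0.0, 0.5, 0.0]) + rng.normal(scale=0.5, size=300)

pipe = make_pipeline(StandardScaler(), Ridge())

# Candidate regularization strengths; "ridge" is the step name make_pipeline assigns
param_grid = {"ridge__alpha": [0.01, 0.1, 1.0, 10.0, 100.0]}

# TimeSeriesSplit: each validation fold comes after the rows it was trained on
search = GridSearchCV(pipe, param_grid, cv=TimeSeriesSplit(n_splits=5),
                      scoring="neg_mean_squared_error")
search.fit(X, y)

print("best alpha:", search.best_params_["ridge__alpha"])
print("best CV MSE:", -search.best_score_)
```

scikit-learn reports errors as negative scores so that "higher is better" holds for every scorer, which is why the MSE is negated when printed.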

After training the model, we can make predictions on the testing data using the predict() method of the LinearRegression class. The accuracy of the model is evaluated using different metrics, such as Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), R-squared, and Explained Variance.

The metrics reported in the output above can be read as follows:

- Mean squared error (MSE) is the average squared difference between the predicted and actual values. Because the errors are squared, it is expressed in squared index points; the output value is 16.0895.
- Mean absolute percentage error (MAPE) is the average absolute difference between the predicted and actual values, expressed as a percentage of the actual value. Because the code multiplies by 100, the output value of 0.0605 means an average error of about 0.06%, not 6%.
- R-squared measures the fraction of the variance in the target that the model's predictions account for. It is at most 1, with higher values indicating a better fit, and it can be negative on test data when the model does worse than always predicting the mean. The output value of 0.9999993 is a near-perfect fit on the test set. If, as the column names suggest, each row's High and Low come from the same session as its Close, these features already bound the value being predicted, so such a high R-squared is expected and overstates forecasting skill.
- Explained variance measures the proportion of the target's variance captured by the model. Like R-squared, it is at most 1; it differs only in ignoring any constant bias in the errors. The output value of 0.9999993 is almost identical to R-squared, which means the predictions have almost no systematic bias.
- Root mean squared error (RMSE) is the square root of the MSE and is in the same units as the target (index points). The output value of 4.011 means the predictions typically deviate from the actual close by about 4 points.
- Mean absolute error (MAE) is the average absolute difference between the predicted and actual values, also in index points. The output value is 2.269 points.
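The definitions above can be checked by computing each metric by hand and comparing with scikit-learn. The sketch uses a few hypothetical actual and predicted closes; note that scikit-learn's mean_absolute_percentage_error (available since version 0.24) returns a fraction, whereas the post's code reports a percentage.

```python
import numpy as np
from sklearn.metrics import (explained_variance_score, mean_absolute_error,
                             mean_absolute_percentage_error, mean_squared_error, r2_score)

# Hypothetical actual vs predicted closes (index points), for illustration only
y_true = np.array([17000.0, 17100.0, 16950.0, 17200.0])
y_pred = np.array([17004.0, 17095.0, 16952.0, 17190.0])

err = y_true - y_pred
mse = np.mean(err ** 2)                           # squared points: 36.25
rmse = np.sqrt(mse)                               # points
mae = np.mean(np.abs(err))                        # points: 5.25
mape_pct = np.mean(np.abs(err / y_true)) * 100    # percent, as in the post's code
r2 = 1 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)
ev = 1 - np.var(err) / np.var(y_true)             # ignores any constant bias in err

print("MSE", mse, "RMSE", rmse, "MAE", mae)
print("MAPE %", mape_pct, "sklearn fraction", mean_absolute_percentage_error(y_true, y_pred))
print("R2", r2, "EV", ev)

# R-squared can be negative: a constant guess far from the data is worse than the mean
r2_bad = r2_score(y_true, np.full_like(y_true, 16000.0))
print("R2 of a poor constant predictor:", r2_bad)  # negative
```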

In addition, the code prints a predicted close of 17360.24 for the next day. This value is computed from the features of the last row in the file, which include that day's Open, High, and Low.
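If each row holds the same day's Open, High, and Low next to the Close being predicted, a genuine next-day forecast needs the target shifted so that each row's features are paired with the following day's close. A minimal sketch on hypothetical data, assuming daily rows sorted oldest-first; the NextClose column name is made up for illustration:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical daily bars, sorted oldest -> newest (OHLC column names follow the post)
rng = np.random.default_rng(3)
close = 17000 + np.cumsum(rng.normal(0, 50, 60))
open_ = close + rng.normal(0, 20, 60)
df = pd.DataFrame({
    "Open": open_,
    "High": np.maximum(open_, close) + rng.uniform(0, 30, 60),
    "Low": np.minimum(open_, close) - rng.uniform(0, 30, 60),
    "Close": close,
})
features = ["Open", "High", "Low", "Close"]

# Target = next row's close; the final row has no next day, so it becomes NaN
df["NextClose"] = df["Close"].shift(-1)

# Train only on rows whose next-day close is known
train = df.dropna(subset=["NextClose"])
model = LinearRegression().fit(train[features], train["NextClose"])

# The last row (next close unknown) is the one to forecast from
latest = df[features].tail(1)
print("Next-day close forecast:", model.predict(latest)[0])
```

Here today's Close is a legitimate input, because it is known before the next session opens.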

One disadvantage of the linear regression model is that it assumes a linear relationship between the input variables and the target variable. In reality, this relationship may be nonlinear, which can lead to poor performance. Linear regression is also sensitive to outliers, which can skew the learned coefficients and lead to poor predictions. Finally, it works best when the input variables are not strongly correlated with one another: strong multicollinearity, as among closely related features such as ema5, ema10, hma5, hma7, and hma9, makes the individual coefficients unstable and hard to interpret, even when the overall predictions remain good.
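One common way to measure multicollinearity is the variance inflation factor (VIF): regress each feature on all the others and compute VIF = 1 / (1 - R²). The sketch below implements that definition with scikit-learn on hypothetical columns (the "_like" names are made up to stand in for two overlapping moving averages and an unrelated oscillator). A frequently quoted rule of thumb treats VIF values above roughly 5 to 10 as a warning sign.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression


def variance_inflation_factors(X: pd.DataFrame) -> pd.Series:
    """VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing column j on the others."""
    vifs = {}
    for col in X.columns:
        others = X.drop(columns=col)
        r2 = LinearRegression().fit(others, X[col]).score(others, X[col])
        vifs[col] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return pd.Series(vifs)


# Hypothetical features: two moving-average-like columns that track the same price
rng = np.random.default_rng(4)
price = 17000 + np.cumsum(rng.normal(0, 50, 500))
X = pd.DataFrame({
    "ema5_like": price + rng.normal(0, 5, 500),
    "ema10_like": price + rng.normal(0, 5, 500),
    "rsi_like": rng.uniform(20, 80, 500),
})
print(variance_inflation_factors(X).round(1))
```

The two price-tracking columns get very large VIFs, while the independent column stays close to 1.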

In conclusion, linear regression is a simple, interpretable, and widely used machine-learning model for predicting stock prices. The Python code above shows how to implement it and how to evaluate its performance with various metrics. The model may be improved by using more relevant features and by trying regularized variants such as Ridge or Lasso, whose regularization strength can be tuned.

Rajandran R

Telecom Engineer turned Full-time Derivative Trader. Mostly Trading Nifty, Banknifty, USDINR and High Liquid Stock Derivatives. Trading the Markets Since 2006 onwards. Using Market Profile and Orderflow for more than a decade. Designed and published 100+ open source trading systems on various trading tools. Strongly believe that market understanding and robust trading frameworks are the key to the trading success. Writing about Markets, Trading System Design, Market Sentiment, Trading Softwares & Trading Nuances since 2007 onwards. Author of Marketcalls.in and Co-Creator of Algomojo (Algorithmic Trading Platform for DIY Traders)
